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^-H Abstract 



o 






Diffraction calculations, such as the angular spectrum method, and Fresnel diffractions, are used for calculating scalar light prop- 
agation. The calculations are used in wide-ranging optics fields: for example, computer generated holograms (CGHs), digital 
holography, diffractive optical elements, microscopy, image encryption and decryption, three-dimensional analysis for optical de- 
vices and so on. However, increasing demands made by large-scale diffraction calculations have rendered the computational power 
of recent computers insufficient. We have already developed a numerical library for diffraction calculations using a graphic pro- 
jcessing unit (GPU), which was named the GWO library. However, this GWO library is not user- friendly, since it is based on C 
language and was also run only on a GPU. In this paper, we develop a new C-h-h class library for diffraction and CGH calculations, 
which is referred as to a CWO-h-h library, running on a CPU and GPU. We also describe the structure, performance, and usage 
^ ^examples of the CWO-h-h library. 
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^^1. Introduction 

O I Scalar light propagation is calculated using several diffrac- 
•— '.tion calculations. These calculations, e.g. the angular spectrum 
^_^ method and Fresnel diffractions, are used in wide-ranging op- 

^ tics fields lllllj], ultrasonic [3], X-ray |0] and so on. In op- 
OO tics, its applications include Computer Generated Holograms 



in 

o 
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(CGH), digital holography, phase retrieval, image encryption 
and decryption, steganography, three-dimensional (3D) analy- 
sis for optical devices, Diffractive Optical Elements (DOE), and 
'so on. 

In CGH and digital holography, diffraction calculations are 
Vital. CGH are generated by calculating a diffraction calcula- 
tion from a 3D object to a hologram plane on a computer. If we 
apply CGH to a 3D display technique Il5l|6|], the technique be- 
comes attractive because a wavefront reconstructed from CGH 
is almost equivalent to an object light. However, the computa- 
tional time required for the diffraction calculation involved in 
CGH hampers the realization of a practical 3D display using 
CGH. Many methods have thus been proposed for accelerating 
the computational time |7, 8, 9,[TQi[Tlll. 

Digital holography is a well-known method for electroni- 
cally recording existing 3D object information on a hologram, 
which is captured by a Charge Coupled Device (CCD) and Com- 
plementary Metal Oxide Semiconductor (CMOS) cameras 1 12, 
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1311 . To reconstruct 3D object information from a hologram in a 
computer, we need to calculate diffraction calculations. Appli- 
cations of digital holography include Digital Holographic Mi- 
croscopy (DHM) QUI], Digital Holographic Particle Track- 
ing Velocimetry (DHPIV) (m lvh and so forth. 

Phase retrieval algorithm retrieves phase information of an 
object light from intensity patterns captured by CCD camera. 
In optics, Gerchberg-Saxton (GS) algorithm llisll and modified 
algorithms, e.g. Fresnel ping-pong lll9ll and Yang-Gu jlw algo- 
rithms, are widely used for phase retrieval. The algorithms are 
based on iterative optimization: namely, they gradually retrieve 
phase information by calculating diffraction calculations be- 
tween certain intensity patterns (normally more than two) while 
subject to amplitude and phase constraints. The applications of 
the algorithms include, for example, wavefront reconstruction 
imEB, holographic projection HBIMIlTilliilll and 
so on. 

Diffraction based encryption and decryption ISO, [3ll [3211 . 
and steganography [33] were proposed . Diffraction-based tech- 
niques have an interesting feature, and handle not only 2D but 
also 3D images. 

In 3D analysis for optical devices, such as optical Micro 
Electro Mechanical Systems (MEMS), DOE and so on, we can 
obtain 3D light distribution to stack multiple two-dimensional 
(2D) diffraction calculations along depth-direction ll34ll . For ex- 
ample, several research works have analyzed the optical charac- 
teristics of a Digital Micro-mirror Device (DMD), which is one 
of the MEMS devices, using Fresnel diffraction and the angular 
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spectrum method ||35l |36 

As mentioned, diffraction calculations are practically used 
in wide-ranging optics fields. The former can also accelerate 
computational time using the Fast Fourier Transform (FFT) al- 
gorithm; however, if we wish to realize real-time 3D recon- 
struction from holograms in digital holography, generate a large- 
area CGH, and obtain 3D light field from an optical device, re- 
cent computers lack sufficient computational power. 

Using hardware is an effective means to further boost com- 
putational speed for CGH and diffraction calculations. In fact, 
we showed dramatically increased computational power to de- 
sign and build special-purpose computers for CGH targeting a 
3D display and named HOlographic ReconstructioN (HORN), 
in order to overcome the computational cost of CGH. The HORN 
computers designed by pipeline architecture can calculate light 



intensities on CGH at high speed ||37l |38l S H iH HI • The 
HORN computers were implemented on a Field Programmable 
Gate Array (FPGA) board, except HORN-1 and -2. To date, we 
have constructed six HORN computers, which have been able 
to attain several thousand times the computational speed of re- 
cent computers. Researchers also developed a special-purpose 
computer, FFT-HORN, in order to accelerate Fresnel diffraction 
in DHPIV EH. The FFT-HORN was able to reconstruct 
256 X 256 images from holograms, which were captured by a 
DHPIV optical system, at a rate of 30 frames per second. The 
FPGA-based approaches for both CGH and diffraction calcu- 
lations showed excellent computational speed, but are subject 
to the following restrictions: the high cost of developing the 
FPGA board, long development term, long compilation of the 
hardware description language and mapping to FPGA times, 
and technical know-how required for the FPGA technology. 

Conversely, recent GPUs allow us to use as a highly paral- 
lel processor, because the GPUs have many simple processors, 
which can process 32- or 64-bit floating-point addition, multi- 
plication and multiply-add instructions. The approach of accel- 
erating numerical simulations using a GPU chip is referred to as 
General-Purpose computation on GPU (GPGPU) or GPU com- 
puting. The merits of GPGPU include its high computational 
power, the low cost of the GPU board, short compilation time, 
and the short development term. 

We have already developed a numerical library for diffrac- 
tion calculations using a GPU, which was named the GWO 
library 11441] . The purpose of the GWO library is to facilitate 
access to GPU calculation power for optics engineers and re- 
searchers lacking GPGPU. The GWO library has already been 
distributed via the Internet and used to report some papers. For 
example, we reported on a real-time DHM system ll45ll . diff'rac- 
tion calculations in a computer-aided design tool for develop- 
ing a holographic display 11461]. a fast CGH calculation ll47[|48ll 
and a DHM observable in multi-view and multi-resolution 11491] . 
Moreover, researchers studied Airy beams generation and their 
propagation feature in the simulation using the GWO library 



5Ql bin . However, the GWO library is not user- friendly be- 



cause it is based on C language, e.g. the library user must man- 
age the CPU and GPU memory allocation personally and so 
on. In addition, the library is run on only a GPU, namely the 
diffraction and CGH calculations are not calculated on a CPU. 



In this paper, we develop a C-h-h class library for computa- 
tional wave optics involved in diffraction and CGH calculations, 
which is referred to as a CWO-h-h library, running on CPU and 
GPU. The CWO-h-h library, unlike the GWO library, is devel- 
oped using C-h-h and its structure, performance, and usage ex- 
amples are described. 

In Section 2, we describe diffraction calculations, the struc- 
ture of the CWO-h-h library, the class "CWO" for diff'raction 
calculations, and the subclass "cwoPLS" for Point Light Source 
(PLS)-based diffraction and CGH calculations on a CPU. In 
Section 3, we describe the "GWO" and "gwoPLS" classes for 
diffraction and CGH calculations on a GPU. In Section 4, we 
describe the implementation of the "CWO" and "GWO" classes. 
In Section 5, we describe field types which are held in the 
classes. In section 6, we show the performance of diffraction 
and CGH calculations on a CPU and GPU. In Section 7, we 
show the applications of the CWO-h-h library to holography. In 
Section 8, we conclude this work. 

2. Detail of the CWO++ library 

The CWO-h-h library mainly consists of two C-h-h classes: 
CWO and GWO. The CWO class calculates diffraction calcu- 
lations on a CPU, has auxiliary functions and allows us to cal- 
culate the following diffractions: 

1 . Fresnel diffraction(Convolution form) 

2. Fresnel diffraction(Fourier form) 

3. Shifted Fresnel diffraction 

4. Angular spectrum method 

5. Shifted angular spectrum method 

6. 3D diffraction calculation 

The first to fifth diff'ractions are primary diffraction calculations, 
on which the sixth diffraction calculation is based. The above 
diffraction calculations can also be calculated on a GPU using 
the GWO class. 

Table [T] shows the structure of the CWO-h-h library. The 
class "CWO" is the top class of the CWO-h-h library, while the 
other classes, which are "GWO", "cwoPLS" and "gwoPLS", 
are inherited from "CWO". The "cwoPLS" and "gwoPLS" 
classes are for PLS-based diffraction and CGH calculations on 
a CPU and GPU, respectively. 

"cwoComplex" and "cwoObj Point" are data structures and 
their auxiliary functions for complex number and object points 
for PLS, respectively and are distributed as open- source codes. 

2.1. Diffraction calculation 

Figure [T] shows a diff'raction calculation by monochromatic 
wave, whose wavelength is A, between a source plane (aperture 
function) ui(xi,yi) and a destination plane W2fe,}^2). 

The CWO-h-h library allows us to calculate FFT-based diffrac- 
tion calculations. In addition, diffraction calculations are cate- 
gorized into convolution and Fourier forms. The former cat- 
egory includes Fresnel diffraction (convolution form). Shifted 
Fresnel diffraction. Angular spectrum method and Shifted an- 
gular spectrum method. The latter category includes Fresnel 
diffraction (Fourier form). In the following subsections, we de- 
scribe these diffractions. 



Table 1: Classes of the CWO++ library. They are distributed as open-source codes. 



Class 


Role 


Parent class 


Related source files 


cwo 


Diffraction calculation 
on CPU 


None 


cwo.h 
cwo.pp 


gwo 


Diffraction calculation 
on GPU 


cwo 


gwo.h 
gwo.cpp 


cwoPLS 


PLS- based diffraction 

and CGH calculations 

on CPU 


cwo 


cwoPLS.h 
cwoPLS.cpp 


gwoPLS 


PLS -based diffraction 

and CGH calculations 

on GPU 


gwo 


gwoPLS.h 
gwoPLS.cpp 


cwoComplex 


Complex number 


None 


cwo.h 


cwoObjPoint 


PLS 


None 


cwo.h 


cwoVect 


Vector operations 


None 


cwo.h 



Destination plane y^ 



Source plane y^ length 




Figure 1: Diffraction calculation by monochromatic wave, whose wavelength is 
A, between a source plane (aperture function) ui(xi,yi) and a destination plane 

U2iX2,y2)- 



2.1.1. Fresnel diffraction (convolution form) 

The convolution form of Fresnel diffraction is expressed by: 



W2fe,j2) = 



exp(/^z) 
iAz 



I I ui(xu 



yi) 



(1) 



exp(/— ((X2 - xif + (y2 - yif))dxidyi 
Az 



The above equation is the convolution form, and can be ex- 
pressed relative to the following equation according to convo- 
lution theorem: 



;2;r. 



U2(X2,y2) ■■ 



expOfz) 
iAz 



r-' 



r 



u(xuyi) 



r 



hfixuyi) 



(2) 



where, the operators 9^[-] and T~^[-] indicate Fourier and in- 
verse Fourier transforms, respectively, hpix.y) is the impulse 
response function (also known as the point spread function) of 
Eq. ([T]) as follows. 



/z/.(x,j) = exp(/— (x^-h/)) 
Az 



(3) 



In the numerical implementation, we need to discretize each 
spatial variable and use FFT instead of Fourier transforms as 
follows: The discretizing space variables are (xi , ji) = (mi Axi , 
^1 Aji), where Axi and Aj^i are the sampling pitches and mi, ^i 
are integer indices on the source plane. The discretizing space 
variables are (x2, J2) = im^t^x^.n^l^yi), where Ax2 and A_y2 are 
the sampling pitches and m2, n^ are integer indices on the desti- 
nation plane. The ranges of integer indices are as follows: 



^x ^x 

-— < mi,m2 < — T- - 1, 

Ny ^y , 

-— < mi,m2 < — T- - 1 



(4) 



where, A^;^ and A^3; are the numbers of horizontal and vertical 
pixels on the source and destination planes, respectively. 

The discretized Fresnel diffraction of the convolution form 
is as follows: 



U2(m2,n2) = 



exp(/^z) , 

-FFT 



iAz 
FFT /z/7(mi,mi) 



FFT 



u(mi,mi) 



hf(m,n) = exp(/ — ((mAxi)^ -h (nAyif)) 
Az 



(5) 



(6) 



Note that the sampling pitches on the destination planes 
are the same as those on the source plane after the diff'raction, 
namely Ax2 = Axi and Ay 2 = Ayi. 

2.1.2. The Fresnel diffraction (Fourier form) 

We can obtain the Fourier form of the Fresnel diffraction to 
expand the quadratic term in Eq. (O. The form is expressed by: 



exp(/Y^) 71 J J 

u(x2,y2) = exp(?— (^2 + J2)) 

lAz Az 



n: 



In 
w'(xi, ji)exp(-/ — (X2X1 -\-y2Xi))dx\dyi, 
Az 



(7) 



where u\xi,yi) is defined as, 

(•^1 + yj) 

u(xuyi) = u(xuyi)exp(i7i ). 



(8) 



This form is rewritten by the Fourier transform. The dis- 
cretized Fourier form of Fresnel diffraction is expressed by: 



u{m2,n2) = 



exp(/^z) n ^ ^ 
exp(/— ((m2Ax2)^ + {n2^y2r)) 

lAz AZ (q\ 



FFT 



u'{m\,n\) 



Note that the sampHng spacing on the destination plane is 
scaled to Ax2 = Axi/(/lz) and Aj2 = Aji/(/lz). 

2.1.3. Shifted Fresnel diffraction 



Shifted-Fresnel diff'raction Il52l1 enables arbitrary sampling 
pitches to be set on the source and destination planes as well 
as a shift away from the propagation axis, which is referred as 
to off'-axis propagation. The equation is derived from a com- 
bination of Fourier form Fresnel diff'raction and scaled Fourier 



transform 115 311 . The same methods were proposed in somewhat 
diff'erent areas ||54l |55|] . The discretized shifted Fresnel diff'rac- 
tion is expressed by the following equations: 



U2[m2,n2] = CsFFT ^ 



FFT 



u\[mi,ni] 



FFT 



hs[mi,ni] 

(10) 



Cs = -^ exp(i— (xf + yi)) 

lAz AZ 

2.71 2 2 

exp(-i—(OxoPxoXi + Oy^pyji)) exp(-i7i(S xJn^ + Syn^)) 

AZ 

(11) 



u[[muni] = ui[muni]exp(i—(xQ -hjq)) 

AZ 

Qxp(-i27r(moS jcOjc, -\-noSyOy^)) 
exip(-i7i(S xinl + Synl)) 



h[m,n] = Q^xp(in(S xm -\- Syn )) 



(12) 



(13) 



where, (Oxq, Oy^) and (Ox^, Oy^) are the shift distances away 
from the propagation axis and S x,Sy,xo,yo,xi,yi are defined 
as follows: 



^ _ PxqPxi ^ _ PyoPyi 

Az Az 

xo = moPxo + Ox,, yo = nopy^ + Oy^, 
xi = mipx, + Ox,, yi = nipy^ + Oy^. 



(14) 



For more details, see Ref. Il52l1 . 



2.7.4. Angular spectrum method 

The angular spectrum method can be devised from the Helm- 
holtz equation and is suitable for computing diff'raction at short 
distance, which is impossible for Fresnel and Shifted Fresnel 
diff'ractions. The angular spectrum method is expressed by: 



u{x,y)= A(Ufy,0)HA(Ufy) 

%J %J —oo 

exp(/ 27i(fxX + fyy))dfxdfy. 



(15) 



where, fx and fy are spatial frequencies, A(fx,fy,0) is defined 
by A(fx,fy,0) = T[u(xuyi)], and HAifxJy) is the transfer 
function. 



HAifxJy) = ^xp(iz^k^-4nHfI^fy^)) 



(16) 



where, k = 2nl A is the wave-number. 

The discretizing frequencies are (fx,fy) = (miAfx,niAfy), 
where Afx and A/3; are the sampling pitches on the frequency 
domain. Therefore, the discretized angular spectrum method is 
expressed by: 



u(m, n) = FFT 



FFT 



u(m, n) 



HA(muni) 



(17) 



2.1.5. Shifted angular spectrum method 

The angular spectrum method calculates the exact solution 
in scalar light propagation because it is derived from the Helmholtz 
equation without using any approximation, but only calculates 
light propagation when close to a source plane due to the alias- 
ing problem. In addition, the angular spectrum method does not 
calculate off'-axis propagation. 

The shifted angular spectrum method 11561] method enables 
off'-axis propagation by applying a band-limited function to Eq. 
([T6l) . In addition, although the angular spectrum method, as 
mentioned, triggers an aliasing error when calculating a long 
propagation distance IdTIi . the shifted angular spectrum method 
overcomes the aliasing problem, making it an excellent means 
of performing diffraction calculations. The discretized shifted 
angular spectrum method is expressed by: 



u(m2, ^2) = FFT 



FFT 



u(mi,ni) 



HsAimuni) 



m\ — Cx ^i — Cy 
Recti — )Rect( ) 



Wv 



Wv 



(18) 



where, Hsa is the shifted transfer function of the same. 



HsAimuni) = HA(muni)Qxp(2n(0xmiAfx + OyniAfy)) 



(19) 



Two rectangle functions are band-limit functions with hori- 
zontal and vertical band- widths of Wx, Wy and shift amounts of 
Cx,Cy. See Ref. 11561] for the determination of these parameters. 



2.1.6. Summary of diffraction calculation 

As mentioned, many numerical diffraction calculations have 
been proposed to date, a classification of which is shown in Ta- 
ble [21 Each diff'raction must be used in terms of the number of 
FFTs (namely, which is proportional to the calculation time), 
required memory, propagation distance, sampling pitches and 
on- or off'-axis propagation. For example, if we calculate the 
diff'raction calculation with diff'erent sampling pitches on source 
and destination planes, we must use Shifted Fresnel diff'raction. 

Note that, in the diff'raction calculations of the convolution 
form, the area of the source and destination planes must be ex- 
panded from Nx X A^3; to 2Nx x 2Ny during the calculation, be- 
cause we avoid aliasing by circular convolution. After the cal- 
culation, we extract the exact diff'racted area on the destination 
plane with the size of NxX Ny. Therefore, we need FFTs and 
memories of size 2Nx x 2Ny during the calculation. 

2.2. CWO class: Simple example using the CWO-\-\- library of 
the Fresnel diffraction calculation on the CPU 

The CWO class provides diff'raction calculations and auxil- 
iary functions. We used the FFTW library [58] in the diff'raction 
calculations in the CWO class. 

We show the following source-code of the Fresnel diffrac- 
tion calculation on a CPU using the CWO-F-F library: 

Listing 1: Fresnel diffraction calculation on a CPU using the CWO class. 



1 CWO c; 

2 C.Load("lena512x512.bmp"); 

3 c.Diff'ract(0.2, CWO_FRESNEL_CONV); 

4 c.IntensityO; 

5 c.Scale(255); 

6 C.Save("lena512x512_dif fract .bmp"); 



In line 1, we define the instance "c" of the CWO class. In 
the next line, using the member function of "Load" in the CWO 
class, we store an original image (Fig|2ta)) of a bitmap file 
"lenaS 1 2 x5 1 2.bmp" with 512x512 pixels into the instance "c" . 
"Load" can also read jpeg tiff' formats, and so forth. See more 
details in Appendix B In line 3, the CWO member function 



"Diff'ract" calculates the diff'raction calculation according to the 
first and second arguments of the function, which are the propa- 
gation distance and type of diff'raction calculation, respectively. 
Here, we select the propagation distance of 0.2 m and Fresnel 
diff'raction of the convolution form (CWO_FRESNEL_CONV). 
The calculation results in a complex amplitude field. 

To observe the light intensity field of the complex amplitude 
field as image data, the initial step involves the CWO mem- 
ber function "Intensity" calculating the absolute square of the 
complex amplitude field in line 4. Namely, the calculation is 
expressed by: 



/(m2,^2) = 1^2(^2,^2)1 



(20) 



In the next, the CWO member function "Scale" converts 
the intensity field with a large dynamic range into that with 256 
steps. 

/(m2, ^2) - min{I(m2, ^2)} 



where, the operators max{-] and min{-] take the maximum and 
minimum values in the argument. Finally, we save the intensity 
field with 256 steps as a bitmap file "lena5 12x5 12_diff'ract.bmp". 
Of course, we can also save the intensity field in other image 
formats. Figure[2l(b) shows the diff'racted image. 

Note that the sample code does not explicitly indicate the 
wavelength and sampling pitches on the source and destina- 
tion planes. The default wavelength and sampling rates are 633 
nm and lOyum x lOyum. If we change the wavelength and sam- 
pling pitch, we can use the CWO member functions "SetWave- 
Length" and "SetPitch". 




^256(^2, ^2) = 



max{I(m2, n2)} - min{I(m2, ^2)} 



x255, (21) 



Figure 2: (a) Original image with 512 x 512 pixels, (b) Diffracted image with 
the wavelength of 633 nm and the sampling pitches of 10//m x 10//m on the 
source and destination planes. 



2.3. Three-dimensional diffraction calculations 

Certain methods for calculating a scalar 3D field were pro- 
posed iHlli]. In Ref. IH], the method can calculate a 3D 
diff'raction calculation in a low numerical aperture (NA) system 
using once 3D-FFT, so that the method is eff'ective in terms of 
computational cost and memory amount. In the CWO-h-h li- 
brary, we provide a calculation of a scalar 3D diff'raction field 
by stacking 2D diff'raction calculations as mentioned in a depth 
direction [34] . Figure [3] shows a scalar 3D diff'raction field 
by stacking 2D diff'raction calculations. Unlike 2D diff'raction 
calculations, the 3D diff'raction calculation requires a sampling 
pitch Az in a depth direction. 

List[2lshows the 3D diffraction calculation from a hologram, 
on which two small circles are recorded by the angular spec- 
trum method. In the calculation, the conditions are as follows: 
the number of pixels in 3D diffracted field is512x512x512, 
the horizontal and vertical sampling pitches are lOjim x lOyum, 
the sampling pitch in a depth direction Az is 1cm, the distance 
between the two small circles and the hologram is 10cm and 
the radii of the two small circles are 210/im. We need to set the 
large sampling pitch in a depth direction compared with the hor- 
izontal and vertical sampling pitches because the optical system 
has a low NA. 

In line 1, we prepare two instances "cl" and "c2" because 
we need to maintain a hologram and a 3D diff'racted field, in- 
dividually. In line 2, we load the hologram to the instance cl. 
In line 3, we allocate the memory of512x512x512 pixels 



Table 2: Classification of diffraction calculation. In the diffraction calculations of the convolution form, the area of the source and destination planes must be 
expanded from Nx x A^_y to 2Nx x 2Ny during the calculation, because we avoid aliasing by circular convolution. After the calculation, we extract the exact diffracted 
area on the destination plane with the size of N^x Ny. Therefore, we need FFTs and memories of size 2Nx x 2Ny during the calculation. 



Diffraction 


Propagation 
distance 


Sampling 

pitch on 

source 

plane 


Sampling 

pitch on 

destination 

plane 


Off-axis 
calculation 


Number 

of 

FFls 


Required 
memory 


Fresnel 

diffraction 

(Convolution form) 


Fresnel 


AxxAy 


AxxAy 


N.A. 


3 


2Nx2N 


Fresnel 

diffraction 

(Fourier form) 


Fresnel 


AxxAy 


Ax y Ay 


N.A. 


1 


NxN 


Shifted 

Fresnel 

diffraction 


Fresnel 


Arbitrary 


Arbitrary 


Available 


3 


2Nx2N 


Angular 
spectrum 
method 


Short distance 


AxxAy 


AxxAy 


N.A. 


2 


2Nx2N 


Shifted 

angular 

spectrum method 


All 


AxxAy 


AxxAy 


Available 


2 


2Nx2N 



for the 3D diffraction field. In line 4, we set horizontal, verti- 
cal and depth-direction sampling pitches, respectively. In line 
5, the 3D diffraction field is calculated by the member func- 
tion "Diffract3D". The first argument is set to the instance of 
the CWO class maintaining the hologram. The second argu- 
ment is the distance z as shown in Fig. [3]between the hologram 
and 3D diffracted field. The third argument concerns the type 
of diffraction calculation. In line 6, we save the calculated 3D 
diffraction field as a cwo file, which includes the 3D diffraction 
field in complex amplitude in binary form. Figured shows the 
volume rendering from the cwo file. In the left figure, we can 
see the two circles reconstructed head-on in the volume. In the 
right figure, we can see a different view of the reconstructed 
two circles. 

Destination planes 




Source plane y Wave 



Listing 2: 3D diffraction calculation based on the stack of 2D diffracted 


planes. 


1 


CWO cl,c2; 






2 


cl.Load( "hologram, bmp"); 




3 


c2.Create(512,512 


512); 




4 


c2.SetPitch(10e-6, 


lOe-6, 0.01); 




5 


c2.Diffract3D(al, - 


0.1,CWO_ANGULAR); 




6 


c2.Save("3d.cwo"); 








Figure 3: Scalar 3D diffraction field by stacking 2D diffraction calculations. 



Figure 4: Volume rendering of the 3D diffraction calculation by List|2] 

2.4. Calculations of a complex amplitude field and a computer 
generated hologram from point light sources 

Current CGH calculations are mainly categorized into two 
approaches: polygon-based liol |64J] and Point Light Sources 
(PLSs) approaches itzi Isl |6l|] . Polygon-based approaches are 
based on diffraction calculations using FFTs. In the subsection, 
we focus on the PLS approach, which treats a 3D object as a 
composite of multiple PLSs, and generates a CGH from a 3D 
object more flexibly than polygon-based approaches. 

The class "cwoPLS" calculates a complex amplitude field 
and CGH from PLSs. In Fig. [5l the calculations assume that a 
3D object is composed of PLSs with a number of N. 
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Figure 5: PLS-based diffraction and CGH calculations. 

We can calculate the complex amplitude field or CGH to 
superimpose PLS distributions from one PLS to the destination 
plane. The complex amplitude field u(x,y) from PLSs by Fres- 
nel diff'raction is expressed as, 



u(x,y) = 



exp(/^z) 



^A,exp(-( 



27i(x-Xj)^-\-(y-yj)^ 



2^7 



)) 



(22) 



where, (x,y) and (xj,yj,Zj) are coordinates on the destination 
plane and a 3D object, Aj is the light intensity of the 3D object. 
A CGH calculation based on Fresnel diff'raction is expressed as 



(23) 



n ^ X^A ,2n (x-Xjf^iy-yjf 



where, /(x, y) is the CGH pattern. The computational complex- 
ity of the above formulas are 0(NNxNy), where A^;^ and Ny are 
the horizontal and vertical pixel numbers of the CGH. 

List [3] shows CGH generation using the class "cwoPLS". 
We generate a 2, 048 x 2, 048 pixels CGH from the 3D object of 
a dinosaur with 1 1,646 points. The sampling pitch on the CGH, 
wavelength (default value) and distance between the CGH and 
the 3D object is 4jdm x 4yum, 633^m and 0.2 m, respectively. 
The class treats the 3D object as the source and the CGH as the 
destination plane. 

In line 4 of List[3l the member function "Create" allocates 
the required memory of the CGH. In line 5, the member func- 
tion "SetDstPitch" is set to the sampling pitches on the destina- 
tion plane, namely the CGH. In line 6, the first and second argu- 
ments of the member function "SetOff'set" are set to the off'sets 
of the 3D object, namely horizontal and vertical off'sets of 1 cm 
(2500 pixels) and -2 mm (-512 pixels) from the origin, respec- 
tively. In addition, the third argument sets the distance between 



CGH and a 3D object of 0.2 m. In line 7, the 3D object data of 
the dinosaur is read, while in the next line, the coordinates of 
the 3D object are normalized from -4 mm (-1000 pixels) to 4 
mm (1000 pixels). In line 9, the CGH is calculated by Eq. (1231) , 
and in the next line, the calculated CGH pattern is normalized 
to 256 steps by Eq. (|2T1) . 

Figure Oa) and (b) show the CGH generated by Listl3]and 
the reconstructed image from (a) using the CWO class. 

Listing 3: CGH generation using the class "cwoPLS". 



cwoPLS c; 

float p=4e-6; 

float z=0.2; 

c.Create(2048,2048); 

c.SetDstPitch(p,p); 

c.SetOff'set(2500>Hp, -512*p, z); 

c.Load("tyranno_000.3df "); 

c.ScalePoint(p>^1000); 

c.PLS(CWO_PLS_FRESNEL_CGH); 

c.Scale(255); 

c.Save("cgh.bmp"); 



3. GWO and gwoPLS classes: Diffraction and CGH calcu- 
lations on GPU 

The current CWO++ library provides diff'raction and CGH 
calculations on NVIDIA GPU chips by the GWO and gwoPLS 
classes. In this subsection, we briefly describe an NVIDIA 
GPU, and subsequently show the source code of Fresnel difl'rac- 
tion and CGH calculations on a GPU using the GWO and gwoP 
LS classes. 

Although initially GPU chips were mainly used for render- 
ing 3D computer graphics, recent GPU chips have also been 
used for numerical computation. In 2007, NVIDIA released a 
new GPU architecture and its software development environ- 
ment. Compute Unified Device Architecture (CUDA). CUDA 
can be used to facilitate the programming of numerical compu- 
tations more than software previously developed, such as HLSL, 
Cg language and so forth. Since its release, many papers using 
NVIDIA GPU and CUDA have beenpublished in optics. In 
particular, calculations of CGH 1^, '63', '6?, '65', '66^ and recon- 
struction in digital holography |44, 45, 49, 67, 68, 69, 70] have 
successfully accelerated these calculations. 

Figure [7] shows the structure of an NVIDIA GPU, featuring 
GPU chips with some Multi-Processors (MP). Moreover, the 
MP includes Stream Processors (SP), where the number of SPs 
per MP diff'ers depending on the version of the NVIDIA GPU 
chip. One SP can operate 32- or 64-bit floating-point addition, 
multiplication and multiply-add instructions. SPs in an MP op- 
erate the same instruction in parallel, and process diff'erent data. 
Namely, an MP is a Single Instruction Multiple Data (SIMD) 
fashion processor. In addition, each multiprocessor can operate 
the same or difl'erent processing, thus allowing the GPU chip to 
be used as a highly parallel processor. The specifications of the 
GPUs used in this paper are shown in Table [3l 




(a) 



(b) 



Figure 6: (a) CGH generated by the class cwoPLS. (b) Reconstructed image from (a) using the class CWO. 



Table 3: Specifications of the GPUs used in this paper. 





NVIDIA Geforce GTX 460M 


GeForce GTX295 (1 chip) 


GeForce GTX 580 


CUDA processors 


192 


240 


512 


Memory amout (GBytes) 


1.5 


0.896 


1.53 


Core clock(GHz) 


1.35 


1.24 


1.54 


Memory clock (GHz) 


1.25 


1 


2 


Memory band width (GByte /sec) 


60 


223.8 


192.4 



The host computer controls the GPU board and the commu- 
nication between the host computer and the GPU board via the 
PCI-express bus. The host computer can also directly access 
the device memory on the GPU board, which is used for stor- 
ing input data, namely the location of the source plane or 3D 
object, and the destination plane as computed by the GPU. The 
multiprocessor has a shared memory, which is limited, but low 
latency and faster than the device memory. In the classes, these 
memories are used for fast calculation. 

The CUDA compiler compiles a C-like language source 
code for the instruction set of the GPU chip, which is referred 
to as a "Kernel". We download this "Kernel" to the GPU chip 
via the PCI-express bus. The kernel codes in the CWO-h-h li- 
brary are collected in the form of a library and DLL files on 
Windows OS, named "gwoJib.lib" and "gwoJib.dll". When 
we use the GWO or gwoPLS classes, we need to link these files 



to our program. See more details in Appendix A 



The diff'raction calculations, as mentioned, require some 
FFT operations. The CUDA compiler allows us to accelerate 
the FFT algorithm on an NVIDIA GPU chip, which is named 
CUFFT. It is similar to the FFTW library pSf], and we use 
CUFFT for the GWO and gwoPLS classes. 

When the GWO and gwoPLS classes are used, they implic- 



itly execute the following steps, which are hidden to CWO-I--I- 
library users: 

1 . Allocating the required memories for calculation on the 
device memory. 

2. Sending the input data (source plane or 3D object data) 
to the allocated device memory. 

3. Executing kernel functions. The result (complex ampli- 
tude field or CGH on the destination plane) is stored in 
the allocated device memory. 

4. Receiving the result from the allocated device memory. 

5. Releasing the allocated device memories. 

3.1. Fresnel diffraction and CGH calculations on a GPU using 
the GWO and gwoPLS classes 

We show the source-code of a Fresnel diff'raction calcula- 
tion on a GPU using the GWO class. This example is the same 
as Section 12.21 except using GPU, but somewhat diff'erent in 
comparison to List[T] 

Listing 4: Fresnel diff'raction calculation on a GPU using class GWO. 



CWOc; 
GWOg; 

C.Load("lena512x512 .bmp"); 
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Figure 7: Structure of NVIDIA GPU chip. 

g.Send(c); 
g.Diffract(0.2,CWO_FRESNEL_CONV); 

g. Intensity 0; 

g.Scale(255); 

g.Recv(c); 

c.Save("lena512x512_ciiffract .bmp"); 



As the initial change, we define the instance "g" of the 
GWO class, in line 2. As the second change, the source plane, 
namely FigEJa), is sent to a GPU by using the GWO class 
member function "Send". When calling the member function 
"Send", the first step involves the automatic allocation of the re- 
quired device memory on the GPU to the instance "g", via the 
CUDA API function "cudaMalloc". Subsequently, the source 
plane on the host memory is sent to the device memory on the 
GPU using the CUDA API function "cudaMemcpy". 

In lines 5 to 7 of List IH we change the instance names in 
lines 3 to 5 of List [T] from the instance "c" of the GWO class 
to the instance "g" of the GWO class. Therefore, these func- 
tions calculate the diff'raction, intensity and normalization on 
the GPU. Finally, we receive the calculation result from the 
GPU to the host in line 8. 

Next, we show the source-code of CGH calculation on a 
GPU using the GWO class. This example is the same as Section 
12.41 except using GPU. There are also some changes as com- 
pared with ListO 

Listing 5: CGH calculation on a GPU using class gwoPLS. 



cwoPLS c; 

gwoPLS g; 

float p=4e-6; 

float z=0.2; 

c.Create(2048,2048); 

c.SetDstPitch(p,p); 

c.SetOff'set(2500>^p, -512*p, z); 



8 C.Load("tyranno_000.3df "); 

9 g.ScalePoint(p>^1000); 

10 g.Send(c); 

11 g.PLS(CWO_PLS_FRESNEL_CGH); 

12 g.Scale(255); 

13 g.Recv(c); 

14 c.Save("cgh.bmp"); 



4. Implementations of the C WO and GWO classes 

Now, we describe the implementations of the CWO and 
GWO classes. For simplicity, we select the implementations of 
the Fresnel diff'raction (Eq. dS])) on the CWO and GWO classes 
as an example. 

Figure [8] shows the implementations. Figure [8t a) shows a 
portion of the CWO member function "Diff'ract", the functions 
in which are defined as virtual functions of C-h-l-. The functions 
"FFT", "FresnelConvProp", "Mult", "IFFT" and "FresnelCon- 
vCoeff^' calculate FFT, the impulse response of Eq. ([3]), com- 
plex multiplication, inverse FFT and the coefficient of Eq. ([5]). 

When the function "Diffract" in the CWO class is called, 
the virtual functions defined in the CWO class are called (Fig. 
[Htb)). Moreover, these functions also call the functions defined 
in the library and DLL files of "cwoJib" (Fig. [Ud)), which are 
activated on a CPU. 

Conversely, when the function "Diffract" in the GWO class 
is called, the virtual functions defined in the GWO class are 
called (FiglHtc)). Moreover, these functions call those defined 
in the library and DLL files of "gwoJib" (FiglSe)). These func- 
tions are activated on a GPU. Therefore, the function "Diffract" 
in the GWO class calculates the Fresnel diff'raction on a GPU. 

The classes shown in the Table [T] are distributed as open- 
source code, while the functions in the library and DLL files 
of "cwoJib" and "gwoJib", which are closed source code, are 
distributed as binary files. 

The current CWO-h-h library is compatible with Intel and 
AMD x86 CPUs and NVIDIA CPUs. If we port the CWO-f-f 
library to other hardware, we need to inherit the CWO class to 
a new class indicating the new hardware, then re-define virtual 
functions and write functions corresponding to "gwoJib". 

For instance, new CPUs of the HD5000 and HD6000 series 
made by AMD were released. These GPUs have new archi- 
tecture and software environment. Open Computing Language 
(OpenCL). The architectures also differ from NVIDIA GPUs. 
We have already reported the CGH calculation of Eq. (l23l) 117 ill 
and Fresnel diffraction calculation ll72ll . Although we do not 
implement the CWO-h-h library on the AMD GPUs, we will 
make a new class for AMD GPUs using the above mentioned 
method. 



5. Field type 

The classes in the CWO-F-F library, namely the CWO, GWO, 
cwoPLS and gwoPLS classes, have field types indicating cur- 
rent fields. There are three field types: complex amplitude 



Open source 



Closed source 



Member function "Diffract" 
in class CWO 



_FFT(a,a,CW0_C2C); 

_FresnelConvProp(b); 

_FFT(b,b,CW0_C2C); 

_Mul(a, b, a); 

_IFFT(a,a); 

_FresnelConvCoeff(a); 

(a) 



class CWO 



virtual void FFT(...); 

virtual void FresnelConvProp(.. 

virtual void FFT(...); 

virtual void Mul(...); 

virtual void IFFT(...); 



virtual void FresnelConvCoeff(...); 

(b) 



class GWO 

void_FFT(...); 

void FresnelConvProp(...); 

void_FFT(...); 
" void_Mult(...); 
void_IFFT(...); 
void FresnelConvCoeff(...); 



(c) 



CWO lib 



cwoFFT( ); 

cwoFresnelConvProp( ); 

cwoFFT( ); 

cwoMultComplex( ); 

cwolFFT( ); 

cwoFresnelConvCoeff( ); 



(d) 



gwojib 

gwoFFT( ); 

gwoFresnelConvProp( ); 

gwoFFT( ); 

gwoMultComplex( ); 

gwolFFT( ); 

gwoFresnelConvCoeff( ); 



(e) 



Figure 8: Implementations of Fesnel diffraction in CWO and GWO. 



field (the predefined macro: CWO_FLD_COMPLEX ), inten- 
sity field (the predefined macro: CWO _FLD .INTENSITY ) and 
phase field (the predefined macro: CWO_FLD J^HASE). 

The above classes hold the field u(x, y) for the source or des- 
tination plane. If the field type is CWO_FLD_COMPLEX, the 
field u(x, y) maintains a complex amplitude field as a complex 
number array, 



u(x, y) = re(x, y) -\- i im(x, y) 



(24) 



where re(x,};) and im{x,y) indicate real and imaginary com- 
ponents of the complex value on {x,y). Because the current 
CWO-F-F library has single floating point precisions for the real 
and imaginary components, the memory amount for the field 
w(x, 3;) is ^NxNy bytes where A^;^ and A^3; are the number of pix- 
els in the field. 

If the field type is CWO_FLD_PHASE, the class maintains 
the phase field 0(x, y) as a real number array. 



0(x, y) = tan 



im(x, y) 
re(x, y) 



(25) 



If the field type is CWO_FLDJNTENSITY, the class main- 
tains a real number array a(x, y) except the phase field. The real 
number arrays include, for example, image data, amplitude, real 
or imaginary, only part of the complex amplitude field and so 
on. The memory required for the intensity and phase fields is 
4A^;,A^3; bytes. 

Figure [9] shows each field type and mutual conversions be- 
tween each field. We briefly describe the mutual conversions as 
follows: 

1. If we use the member functions "Re()" when the current 
field type is a complex amplitude field, the field is con- 
verted from this to a real part only and the field type is 
set to CWO_FLD_INTENSITY. 



Re() 
lm() 
AmpQ 
Intensity ( ) 
and so on 



O FLD INTENSITY) 




Phase ( ) 
Figure 9: Field types and mutual conversions between each field. 

2. If we use member functions "Phase()" when the current 
field type is a complex amplitude field, the field is con- 
verted from this to an argument of the same and the field 
type is set to CWO_FLD_PHASE. 

3. If we use member functions "ComplexO" when the cur- 
rent field type is an intensity field, the field is converted 
from this to a complex amplitude field according to u(x, y) 
= re(x,y) -\- i im(x, j), where re(x,};) = a{x,y) is the in- 
tensity field and im(x, y) is zero and the field type is set 
to CWO_FLD_COMPLEX. 

4. If we use member functions "ComplexO" when the cur- 
rent field type is a phase field, the field is converted from 
this to a complex amplitude field according to u{x, y) = 
re(x, y)-\-i im(x, y), where re(x, y) = cos(0(x, y)) and im(x, y) 
= sin(^(jc, y)) and the field type is set to CWO_FLD_COM 
PLEX. 

5. If we use member functions "Complex(CWO &a, CWO 
&b)" when the current field types of classes "a" and "b" 
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are the intensity and phase fields respectively, the fields 
are converted to a complex amplitude field according to 
u(x,y) = a(x,y)cos(6(x,y)) + / a(x,y)sin(0(x,y)), where 
"a" and "b" hold a(x,y) and 6(x,y), respectively and the 
field type is set to CWO_FLD .COMPLEX. 
6. If we use member functions "Complex(CWO &a, CWO 
&b)" when the current field types of classes "a" and "b" 
are both intensity fields, the fields are converted to a com- 
plex amplitude field according to u(x, y) = re(x, y) 
+/ im(x, y), where re(x, y) and im(x, y) are the fields of "a" 
and "b", respectively and the field type is set to CWO J^LD 
.COMPLEX. 

List[6]and Fig[TOlshow examples of mutual conversions be- 
tween each field type and their results, respectively. Figure [TOl 
(a) is an original image. We calculate the diff'racted light of the 
figure in lines 2 and 3 of List [6] because we observe the real 
part, imaginary part, amplitude and phase of the diff'racted field 
respectively. 

The real part of the diff'racted light (FiglTOtb)) is obtained 
in lines 8 to 10, and, the imaginary part of the diff'racted light 
(Fig. [I3c)) is obtained in lines 1 1 to 13. The amplitude 

JrQ(x, y)^ -\- im(x, y)^ 

of the diff'racted light (FigfTOld)) is obtained in lines 17 to 19, 
and, the phase of the diff'racted light (Fig. [TOte)) is obtained in 
lines 20 to 22. In lines 26 to 28, we show the generation of the 
complex amplitude field from the instances "a" and "b", which 
hold amplitude (CWO.FLD.INTENSITY) and phase (CWO.FL 
D.PHASE) respectively. In lines 29 to 32, we show the result 
of the back propagation result (FiglTOlf)) from the position of 
the complex amplitude field to that of the original image. In the 
result, although we observe a diff'raction eff'ect to some extent 
at the edges of the figure, the result is almost the same as the 
FigOa). 

Listing 6: Example of mutual conversions of field types. 



1 


CWO a,b,c; 


2 


a.Load("leiia512x512.bmp"); 


3 


a.Diff'ract(0.1,CWO.ANGULAR); 


4 
5 


b=a; 


6 


c=a; 


7 
8 


b.ReO; 


9 


b.Scale(255); 


10 


b.Save("re.bmp"); 


11 


c.ImO; 


12 


c.Scale(255); 


13 


c.Save("iin.bmp"); 


14 
15 


b=a; 


16 


c=a; 


17 


b.AmpO; 


18 


b.Scale(255); 


19 


b.Save("amp.bmp"); 



20 


c.PhaseO; 


21 


c.Scale(255); 


22 


c.Save("pliase .bmp"); 


23 
24 


b=a; 


25 


c=a; 


26 


b.AmpO; 


27 


C.PhaseO; 


28 


a.Complex(b,c); 


29 


a.Diff'ract(-0. 1 ,CWO.ANGULAR); 


30 


a.Re(); 


31 


a.Scale(255); 


32 


a. Save("c omplex.bmp"); 



6. Performance 

In this section, we show the calculation times of each diffrac- 
tion calculation and CGH calculation on an Intel CPU and NVID 
lA CPUs. We used an Intel CPU, which was a Core 17 740 
QM (with CPU clock frequency of 1.7GHz), and three CPUs, 
namely NVIDIA GeForce GTX 460M, GerForce GTX 295 and 
GeForce GTX580. The GPU specifications are shown in Table 

m 

In the diff'raction calculations, the impulse response and trans- 
fer functions of each diff'raction are only sufficient to calculate 
them once when the parameters, which are the propagation dis- 
tance, the sampling pitches, wavelength, off'sets and so on, are 
unchanged. Of course, we need to re-calculate the impulse re- 
sponse and transfer functions when the parameters are changed. 
Therefore, we evaluated the calculation times of each diff'rac- 
tion in both cases of re-calculation and once-calculation. 

For the evaluation, we used Lists [T] and |4] and changed the 
image size and diff'raction type. Table |4] shows the calculation 
times of diffraction calculations on the CPU and GPUs with 
recalculation of the impulse and transfer functions. Table [5] 
shows the calculation times of diff'raction calculations on the 
CPU and GPUs with the once-calculation of the impulse and 
transfer functions. In the table, except for the diff'raction type 
CWO.FRESNELJ^OURIER, we expand the area of the source 
and destination planes from A^;^ x Ny to 2Nx x 2Ny during the 
calculation and avoid aliasing by circular convolution. In the 
CPU calculations, we measured the time in line 3 of List [TJ In 
the GPU calculations, we measured the time in lines 4 and 5 of 
Listll 

Table[6l shows the calculation times of CGH calculations on 
the CPU and GPUs. For the evaluation, we used Lists [3] and [5] 
and changed the number of PLSs. In the CPU calculations, we 
measured the time in line 9 of List O In the GPU calculations, 
we measured the time in lines 10 and 1 1 of ListO 

The calculation times of each diff'raction calculation and 
CGH calculation on the GPUs were much faster than those of 
the CPU. 
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Table 4: Calculation times of each diffraction calculation on the CPU and CPUs with the re-calculation of the impulse and transfer functions. 



Resolution 


CPU (ms) 


GPU (ms) 


Intel Core 17 740QM 


GeForce GTX 460M 


GeForce GTX295 (1 chip) 


GeForce GTX580 


Fresnel diffraction convolution form (CWO.FRESNEL.CONV) 


512x512 


248 


15 


5 


3 


1024 X 1024 


1.24x10^ 


47 


15 


10 


2048 X 2048 


6.12x10-^ 


177 


67 


38 


Fresnel diffraction Fourier form (CWO_FRESNEL_FRESNEL) 


512x512 


51.7 


2.4 


1 


1 


1024 X 1024 


227 


6 


2 


2 


2048 X 2048 


984 


19 


9 


8 


Shifted Fresnel diffraction (CWO.SHIFTED_FRESNEL) 


512x512 


477 


15 


5 


3 


1024 X 1024 


2.07 X 10^ 


48 


16 


10 


2048 X 2048 


9.48 X 10^ 


186 


71 


40 


Angular spectrum method (CWO .ANGULAR) 


512x512 


260 


12 


3 


2 


1024 X 1024 


1.17x10^ 


36 


11 


8 


2048 X 2048 


5.56x10^ 


135 


47 


29 


Shifted angular spectrum method (CWO_SHIFTED J^lNGULAR) 


512x512 


269 


15 


4 


3 


1024 X 1024 


1.23x10^ 


44 


14 


9 


2048 X 2048 


5.66 X 10^ 


157 


58 


35 



Table 5: Calculation times of each diffraction calculation on the CPU and CPUs with the once-calculation of the impulse and transfer functions. 



Resolution 


CPU (ms) 


GPU (ms) 


Intel Core 17 740QM 


GeForce GTX 460M 


GeForce GTX295 (1 chip) 


GeForce GTX580 


Fresnel diffraction convolution form (CWO.FRESNEL.CONV) 


512x512 


117 


10 


3 


2 


1024 X 1024 


620 


29 


11 


6 


2048 X 2048 


3.30 X lO-' 


104 


48 


26 


Fresnel diffraction Fourier form (CWO JRESNEL JRESNEL) 


512x512 


52 


2 


1 


1 


1024 X 1024 


229 


5.4 


2 


2 


2048 X 2048 


993 


19 


9 


8 


Shifted Fresnel diffraction (CWO.SHIFTED JRESNEL) 


512x512 


346 


10 


3 


2 


1024 X 1024 


1.50x10^ 


30 


11 


7 


2048 X 2048 


6.91 X 10^ 


110 


51 


28 


Angular spectrum method (CWO.ANGULAR) 


512x512 


121 


9.5 


3 


2 


1024 X 1024 


624 


26 


10 


6 


2048 X 2048 


3.30 X 10^ 


97 


44 


24 


Shifted angular spectrum method (CWO_SHIFTED J^lNGULAR) 


512x512 


128 


10 


3 


2 


1024 X 1024 


665 


31 


12 


7 


2048 X 2048 


3.51 X 10^ 


119 


54 


29 
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J\J yii 





(d) 



(e) 



(f) 



Figure 10: Resluts of mutual conversions between each field, (a) original image (b) real part of the diffracted light (c) imaginary part of the diffracted light (d) 
amplitude of the diffracted light (e) phase of the diffracted light and (f) back propagation from the complex amplitude of (d) and (e). 

Table 6: Calculation times of CGH calculation on the CPU and CPUs 





CPU (ms) 


GPU (ms) 


Number of PLSs 


Intel Core i7 740QM 


GeForce GTX 460M 


GeForce GTX295 (1 chip) 


GeForce GTX580 


248 


1.3 xlO^ 


87 


67 


31 


4596 


2.2 X 10^ 


650 


562 


230 


11646 


5.9 X 10^ 


1.7x10^ 


1.43 X 10^ 


579 



7. Applications to holography 

In this section, we show some appHcations to holography 
using the CWO++ Hbrary and its performances on the CPU 
and GPUs. 

7.1. Inline phase-only CGH (Kinoform) 

In this subsection, we show an example of generating an 
inline phase-only CGH, also known as kinoform. A kinoform 
is calculated only by extracting the phase of a diffracted light 
onto the kinoform plane. List [7] shows the generation of an 
inline phase-only CGH with 512x512 pixels from the original 
image (Fig. [2](a)). 

In line 3, we add a random phase to the original image using 
the function "SetRandPhaseO" to spread the light. The function 
automatically sets to the complex amplitude field (CWO_FLD_C 
OMPLEX). The random phase is generated by Xorshift RNGs 
algorithm 11731] . In lines 4 and 5, we calculate the kinoform by 
diff'racting the original image at the propagation distance of 0. 1 
m, and subsequently extract the phase only from the diff'racted 
light, which is the kinoform. 

After line 7, these codes are for reconstruction from the ki- 
noform. In line 7, we convert the phase field (CWO_FLD .PHASE) 



to the complex amplitude field (CWO_FLD_COMPLEX). In 
lines 8 to 1 1 , we calculate the reconstructed image from the 
kinoform using the back propagation relative to the position of 
the original image. Figure [TT] (a) and (b) show the kinoform 
pattern and the reconstructed image from the kinoform. 

Listing 7: InHne phase-only CGH 



CWOc; 

C.Load("lena512x512 .bmp"); 

c.SetRandPhaseO; 

c.Diff'ractCO. 1 , CWO_FRESNEL_CONV); 

c.PhaseO; 

c.ComplexO; 

c.Diff'ractC-0. 1 , CWO_FRESNEL_CONV); 

c. Intensity 0; 

c.Scale(255); 

C.Save("kinof orin_reconst .bmp"); 



7.2. GS algorithm 

In the subsection, we implement the GS algorithm on the 
CPU and GPUs using the CWO-h-h library and show the per- 
formances of the algorithm on the CPU and GPUs. Although 
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(a) (b) 

Figure 11: (a) Kinoform and (b) reconstructed image from the kinoform. 

it is possible to obtain a complete reconstructed image from a 
complex amplitude field, unfortunately, we lack an appropri- 
ate electric device to display the amplitude and phase of the 
complex amplitude field simultaneously. Therefore, we need to 
select either the amplitude or the phase components of the com- 
plex amplitude field, which will cause the reconstructed image 
to deteriorate due to lack of information on the complex am- 
plitude field. We employ the GS algorithm as an iterative al- 
gorithm llisl [13] in order to improve the deterioration of the 
reconstructed image. 

Figure [121 shows typical a GS algorithm. In the GS algo- 
rithm for Fourier holograms, Fourier and inverse Fourier trans- 
forms correspond to reconstructions from a hologram and holo- 
gram generation, respectively. In the subsection, instead of 
Fourier and inverse Fourier transforms, we use the angular spec- 
trum method and the back propagation of the same. 

We start the iteration by adding a random phase to an input 
image, and calculate the diff'raction calculation from the latter. 
We extract only the phase components ("Phase constraints" in 
FigO from the diff'racted lights to generate a kinoform. The 
kinoforms are reconstructed by inverse diffraction calculation. 
We replace the amplitude of the reconstructed light with the 
original input image ("Amplitude constraint" in Fig[T2l). Re- 
peating the above processes, the GS algorithms gradually im- 
prove the quality of the reconstructed images. 

List [8] shows an example of the GS algorithm on a CPU. In 
lines 1 to 7, we load an input image, calculate its square root, set 
a random phase to it, calculate the diff'racted light to a kinoform 
plane, and subsequently generate a kinoform only by extracting 
the phase of the diff'racted light. The information of the original 
image is maintained in the instance "al", while instance "a2" is 
used for the forward and back propagations. 

In lines 8 to 17, we execute the iteration, the number of 
which is decided by "ite_num". In lines 9 and 11, we calculate 
the reconstructed image by the back propagation of the angular 
spectrum method at a propagation distance of -0. 1 m and extract 
only the phase information of the reconstructed light. In line 
13, we replace the amplitude of the reconstructed light with the 
original input image ("Amplitude constraint" in Fig[T2|, where 
instances "al" and "a2" hold the field type of CWO_FLD JNTE 
NSITY and CWO_FLD_PHASE, respectively. 



In lines 15 and 16, we recalculate a new kinoform from the 
new complex amplitude generated by line 13. Repeating the 
above processes, the GS algorithms gradually improve the qual- 
ity of the reconstructed images. In lines 19 to 23, we calculate 
a final reconstructed image from the kinoform. 

Listing 8: GS algorithm on CPU. 



1 


CWO al,a2; 


2 


al.Load("lena2048x2048 .bmp"); 


3 


al.SqrtO; 


4 


a2=al; 


5 


a2.SetRandPhase(); 


6 


a2.Diff'ract(0.1,CWO_ANGULAR); 


7 


a2.Phase(); 


8 


for(int i=0;i<ite_num;i-F-F){ 


9 


a2.Complex(); 


10 


a2.Diff'ract(-0. 1 ,CWO_ANGULAR); 


11 


a2.Phase(); 


12 
13 


a2.Complex(al,a2); 


14 
15 


a2.Diff'ract(0.1,CWO_ANGULAR); 


16 

17 


a2.Phase(); 
} 


18 
19 


a2.Complex(); 


20 


a2.Diff'ract(-0. 1 ,CWO_ANGULAR); 


21 


a2.Intensity(); 


22 


a2.Scale(255); 


23 


a2.Save("gs_oii_cpu.bmp"); 



List [9] shows an example of the GS algorithm on a GPU. 
The example is almost the same as to List [H The iteration of 
lines 1 1 to 20 is executed on a GPU, so that the example will be 
calculated faster than the CPU version of List[8l 

Listing 9: GS algorithm on GPU. 



1 


CW0cl,c2; 


2 


GW0gl,g2; 


3 


c 1 .Load( " Iena2048x2048 . bmp " ) ; 


4 


cl.SqrtO; 


5 


c2=cl; 


6 


c2.SetRandPhase(); 


7 


gl.Send(cl); 


8 


g2.Send(c2); 


9 


g2.Diff'ractO. 1 ,CWO_ANGULAR); 


10 


g2.Phase(); 


11 


for(int i=0;i<ite_num;i-h-F){ 


12 


g2.Complex(); 


13 


g2.Diff'ract(-0. 1 ,CWO_ANGULAR); 


14 


g2.Phase(); 


15 
16 


g2.Complex(gl,g2); 


17 
18 


g2.Diff'ract(0. 1 ,CWO_ANGULAR); 


19 


g2.Phase(); 


20 


} 
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Figure 12: GS algorithm in Fresnel region in order to improve the deterioration of a reconstructed image. 



g2.Complex(); 

g2.Diffract(-0. 1 ,CWO_ANGULAR); 

g2.Intensity(); 

g2.Scale(255); 

g2.Recv(cl); 

cl.Save("gs_on_gpu.bmp"); 



Changing the resolution of the input image and the iteration 
number, we compare the calculation times of the GS algorithm 
on the CPU and GPUs, which are shown in Table[71 In the CPU 
calculations, we measured the time in lines 3 to 22 of List[8l In 
the GPU calculations, we measured the time in lines 4 to 26 of 
List [9j The calculation times on GPUs were much faster than 
those on the CPU. 

Figures [13] (a), (b) and (c) show the reconstructed images 
when the resolution of the original image was 2, 048 x 2, 048 
pixels and the iteration numbers were 5, 20, and 40 respectively. 

8. Conclusion 

We developed the CWO++ library using the C++ class li- 
brary to calculate the 2D and 3D diffraction and CGH calcu- 
lations on CPU and GPU. Our previous C-language based li- 
brary, GWO, was not user- friendly because, for example, GWO 
library users have to manage the CPU and GPU memory al- 
location by themselves and so on. The CWO++ library re- 
mains user-friendly by concealing troublesome programming 
within classes and the GPU calculation power while unaware 
of the GPGPU technique. Applications capable of applying the 
CWO++ library cover a wide range of optics, ultrasonic and 
X-ray fields and so on. In this paper, applications to hologra- 
phy are shown. The CWO++ library will be distributed from 
[http : //brains . te . chiba-u . jp/~shiino/cwo/| 
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Appendix A. System requirements and installation 

System requirements for the CWO++ library are as follows: 

1. OS : for Windows XP(32/ 64 bit), 7 

2. CUDA : CUDA 4.0 (32 bit version) (if the GWO class or 
gwoPLS is used) 

The installation of the CWO++ library involves the follow- 
ing steps: 

1 . Create a project file of Visual C++. 

2. Ensure the following dll and library files are placed in 
your project directory: 

(a) cwo.dll, cwo.lib, libfftw3f-3.dll (libff'tw3 f-3.dll can 
be download from Ref. 115 811 ) 

(b) gwo.dll, gwo.lib (if you use GPU version of the 
CWO++ library) 

3. Set library files (*.lib) to your VISUAL C++ project. 

4. Set the following C++ and header files to your project: 

(a) cwo.h, cwo.cpp, cwoJib.h 

(b) gwo.h, gwo.cpp, gwoJib.h (if you use the GPU ver- 
sion of the CWO library) 
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Appendix B. Image formats 



Using CImag library 17411 . CWO library allows us to read 
and write image data in the following formats: 

1. Bitmap 

2. Jpeg 

3. Png 

4. Tiff 

If you read and write image formats other than bitmap format, 



you must install ImageMagik 117 511 on your computer. 
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